The work in this report focuses on an analysis of the National Soil Inventory of England and Wales. The aim
was to compare geostatistical methods, mainly ordinary kriging and factorial kriging, with wavelet analysis on
data of a different kind from imagery. The data were from sampling locations on a 5-km grid. To provide an
area as close to a square as possible for the wavelet analysis, just over 3000 points were selected from the
total of over 5000. Two variables were selected for analysis, pH and zinc. The variogram of pH showed that
there was long-range trend in the data, which had to be removed for the geostatistical
analysis. Trend makes the geostatistical analysis more complex, whereas the wavelet analysis is not affected
by it. Zinc was markedly skewed and the data were transformed to common logarithms for the geostatistical
analysis, which again was not necessary for the wavelet analysis. The results have shown some interesting
features. There appears to be no local non-stationarity in these data, which meant that kriging performed
better than the wavelet analysis in terms of the distribution of the errors for the 10-km subsample. However,
for the 40-km subsample the wavelet analysis performed better. The variograms for both properties were
nested and the short-range variation was evident in the high frequency wavelet transform for the 20-km grid.
The variogram can provide a guide as to what sampling interval should be focused on in a multiresolution
analysis using wavelets.

A geostatistical and wavelet analysis of the National Soil Inventory of England and
Wales

Introduction 


It  was  decided  while  on  a  visit  to  TEC  by  Mr  W.  Clark  that  Mr  E.  Bosch  and  Dr  M.  A. 
Oliver  should  extend  the  comparison  of  the  geostatistical  and  wavelet  analysis  that  they 
had  done  on  the  Fort  A.  P.  Hill  SPOT  image  to  a  large  data  set  of  soil  information.  The 
reason  for  this  was  to  see  how  the  techniques  performed  when  the  initial  data  are  a 
sample  rather  than  complete  cover  as  in  the  image.  For  the  SPOT  image  we  had  full 
cover  of  pixel  information  which  we  then  sampled.  Kriging  and  the  low  frequency 
wavelet  coefficients  were  used  to  restore  the  data  that  had  been  removed  (see  report  ?? 
and  Oliver  et  al.,  2000).  For  the  soil  data  we  started  with  sample  information  and 
resampled  this  for  the  analyses.  The  wavelet  analysis  of  the  soil  data  was  done  by  Mr  E. 
Bosch  during  Dr  Oliver’s  visit  to  TEC  in  June  2000. 

The  Soil  Data 

The soil data that we have analysed are part of the National Soil Inventory (NSI) of
England and Wales, which was carried out by the Soil Survey of England and Wales between
1978 and 1983 (McGrath & Loveland, 1992). The aim of the survey was to provide, at the
national level, a record of the soil information in these countries and of both toxicity and
deficiency of some elements of the soil that affect grazing animals and arable crops.
For  the  NSI  to  be  an  unbiased  estimate  of  the  distribution  of  types  of  land  and  their 
properties,  strict  protocols  were  applied  to  site  location  and  description,  soil  sampling 
strategy,  and  soil  profile  description.  This  was  very  unlike  the  practice  of  'free'  soil 
survey  which  is  commonly  used  to  produce  conventional  soil  maps  (Avery,  1987). 
Considerable  effort  also  went  into  quality  control  of  pre-treatment  and  analysis  of  the 
samples,  data  recording,  error  trapping  and  construction  of  the  database,  because  of  the 
number  of  samples  and  the  magnitude  of  the  subsequent  analytical  programme 
(Loveland,  1990;  McGrath  &  Loveland,  1992). 

The number of samples was restricted to those falling at the intersections of a 5-km
orthogonal  grid.  The  sampling  grid  was  offset  1  km  north  and  east  of  the  origin  of  the 
Ordnance  Survey  National  Grid.  If  the  sampling  point  fell  on  anything  other  than  land, 
e.g.  on  a  road,  building,  water-body  etc.,  then  the  sampling  point  was  moved  100  m  north 
of  the  grid  node.  If  that  failed  to  locate  suitable  soil,  then  the  point  was  moved  100  m 
west  from  the  originally  intended  point.  This  process  was  repeated  in  steps  of  100  m  and 
200  m  from  the  grid  node,  in  the  order  north,  east,  south  and  west.  If  no  suitable  soil  was 
found  after  this  procedure,  then  the  site  was  abandoned  for  sampling  purposes,  although 
the  land-use  at  the  original  sampling  point  was  recorded  so  that  the  inventory  was 
complete  and  to  make  clear  the  reason  for  the  deviation.  If  a  new  sampling  point  was 
found,  then  the  standard  procedure  for  description  and  sampling  was  followed  at  that 
point  (see  below).  In  this  way,  an  unbiased  record  of  the  occurrence  of  various  forms  of 
land-use  was  maintained. 

The  principal  interest  was  in  agricultural  land.  No  attempt  was  made  to  devise  a  sampling 
strategy to cover urban areas adequately. In total 5691 sites were sampled. The grid
reference located the site to within 10 m on the ground, i.e. to an accuracy which would
place any return visit within the original soil sampling sub-grid (see below).

Sampling 

The  soil  profile  was  described  in  a  pit  dug  to  80  cm  (or  less  if  rock  was  encountered)  at 
each  sampling  point,  using  standard  terminology  (Hodgson,  1974).  However,  sampling 
was  restricted  to  the  uppermost  15  cm  of  mineral  soil  (or  less  if  rock  intervened),  or  of 
peat,  as  appropriate,  i.e.  litter  layers  were  not  sampled,  as  they  were  regarded  as 
ephemeral.  The  actual  sampling  depth  was  recorded.  Twenty-five  cores  of  soil  were  taken 
at the nodes of a 4-m grid within a 20 m x 20 m square centred on the Ordnance Survey
(OS)  5-km  grid-point.  The  cores  were  taken  with  a  screw-type,  mild-steel  auger,  to  avoid 
contamination  from  traces  of  elements  such  as  chromium  and  manganese  present  in 
stainless,  plated  or  similar  special  steels.  The  cores  of  soil  were  bulked  and  mixed  well  in 
the  field  and  double-bagged,  in  food-grade  polythene  bags,  and  a  waterproof  and  rot- 
proof  label  ('Synteape')  placed  between  the  bags. 

Samples were air-dried and milled in a mild-steel roller-mill (Waters & Sweetman, 1955)
to pass a 2-mm aperture sieve. Preliminary work had shown that no detectable
contamination of the samples arose from this procedure. The resulting data set comprises
up to 127 analytical and descriptive parameters for each of 5691 points across England
and Wales (Loveland, 1990; McGrath & Loveland, 1992). This collection of data is a
unique and invaluable resource.

In this analysis we have examined only pH and total zinc because they represent other
variables well. From a principal components analysis (PCA) Zn was seen to load heavily
on the first component and pH on the third, which reflected the effects of parent
material (PM) and leaching. The pH was measured by a combination electrode and pH
meter in a 1:2.5 soil-water suspension (MAFF, 1986) on soil <2 mm. Zinc was
determined using the <150 micrometre soil. It was extracted by aqua regia, and
determined by ICP-AES (RES) (McGrath & Cunliffe, 1985).


Analysis 

Figures 1a and 2a show the full set of pixel information on the 5-km grid for pH and Zn.
(They have a different colour scale from the remaining maps because they were prepared
on a different computer for the Ministry of Agriculture analysis. Nevertheless, the
variation can be compared, as can the relative area that we have worked on.) The analysis for
this project was carried out on a subset of the full NSI data. This was because the wavelet
analysis requires a set of data that is square and can be sampled in octaves. The outline of
the data for England and Wales is irregular and we selected data from the central part of
the country to obtain as large a square as possible that would suit the needs of the
analysis. This resulted in a data set with 3500 sites. For the wavelet analysis the data were
'padded' with zeros so that there were no gaps. The gaps arose because of the shape of
the coastline and the urban areas within the country that were not sampled (see Figures 1a
and 2a); they appear as white patches. Figures 1b and 2b show the data that were
extracted for the analysis in this report.


Figure 1. Raw data for pH
a)  Original  data  for  pH 

Table 1 gives the summary statistics of the subset of the data. Zinc was strongly
positively skewed, as can be seen from Table 1 and Figure 3a, and it was therefore
transformed to common logarithms (log10), Figure 3b. The transformed values are close to
normally distributed, i.e. Zn is approximately log-normal, which is common for many
elements (Figure 3b and Table 1). The histogram for pH shows that this has a close to
normal distribution, Table 1. A near-normal distribution is necessary for the variogram
analysis because it is based on variances, which are unstable if the data do not have a
near-normal distribution.
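
The effect of the logarithmic transformation on skewness can be illustrated with a minimal sketch. The data below are synthetic log-normal values standing in for the Zn concentrations (they are not NSI data), and the function names are hypothetical.

```python
import numpy as np


def skewness(values):
    """Moment coefficient of skewness (third standardized moment)."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    return float(np.mean(centred**3) / np.mean(centred**2) ** 1.5)


def log10_transform(values):
    """Common logarithms; only defined for strictly positive values."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("log10 requires strictly positive values")
    return np.log10(values)


# Synthetic, positively skewed stand-in for Zn (assumption, not NSI data).
rng = np.random.default_rng(0)
zn_synthetic = rng.lognormal(mean=4.4, sigma=0.6, size=3000)

skew_raw = skewness(zn_synthetic)                   # strongly positive
skew_log = skewness(log10_transform(zn_synthetic))  # close to zero
print(f"skewness raw: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

A skewness near zero after transformation is the property that makes the variances, and hence the semivariances, more stable.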


Figure  3.  Histograms  from  the  subset  of  the  NSI  data  for  Zinc  and  pH. 

The data on the 5-km grid were further subsampled to compare the results of data
reconstruction by both kriging and wavelet analysis. The subsampling produced grids of
10-km (1 site in every block of 4 sites resulting in 869 sites), 20-km (1 site in every block
of 16 sites resulting in 219 sites), and 40-km (1 site in every block of 64 sites resulting in
57 sites).
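
The octave subsampling described above can be sketched as follows, assuming the 5-km data are held in a regular two-dimensional array with one node per cell. The 64 x 64 array and the function name are hypothetical; on a complete grid the counts fall by exactly a factor of four per level, whereas the report's counts (869, 219 and 57) are smaller because of unsampled nodes.

```python
import numpy as np


def subsample_grid(grid, step):
    """Keep every `step`-th node along both axes; set all other nodes to NaN."""
    grid = np.asarray(grid, dtype=float)
    kept = np.full_like(grid, np.nan)
    kept[::step, ::step] = grid[::step, ::step]
    return kept


# Hypothetical complete 64 x 64 block of 5-km nodes (illustration only).
grid_5km = np.arange(64 * 64, dtype=float).reshape(64, 64)

# step 2 -> 10-km grid (1 in 4), step 4 -> 20-km (1 in 16), step 8 -> 40-km (1 in 64)
counts = {}
for step, spacing_km in [(2, 10), (4, 20), (8, 40)]:
    sub = subsample_grid(grid_5km, step)
    counts[spacing_km] = int(np.count_nonzero(~np.isnan(sub)))

print(counts)
```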

Table 1. Summary statistics for pH and zinc for the subset of data used in the analysis.

GEOSTATISTICAL ANALYSIS
Variogram  analysis 

The spatial structure in the data was determined by computing the experimental
variograms of pH and log10 Zn. For the full set of data Zn and pH showed no marked
evidence of anisotropy, therefore only omnidirectional variograms were computed. For
both the full data and the subset the experimental variogram of pH showed evidence of
trend. The semivariances continued to increase after an initial sill had been reached
(Figure 4a). This suggests the presence of smooth continuous variation that violates the
assumptions of geostatistics, which assumes that the variable is random. Therefore, we
modelled the trend by linear and quadratic functions of the co-ordinates so that the
analysis could be done on the residuals from the trend. The linear function was less
effective in accounting for the trend than the quadratic one: the latter removed over 30%
of the trend in both cases. The variogram was then computed afresh on the residuals, and
this now shows a simpler bounded form (Figure 4b).
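
The two steps in this paragraph, fitting a quadratic trend surface and computing an omnidirectional experimental variogram on the residuals, can be sketched as below. This is an illustration under stated assumptions, not the software used for the report: the trend is fitted by ordinary least squares, the variogram uses the usual method-of-moments estimator with simple distance classes, and the data and all names are synthetic or hypothetical.

```python
import numpy as np


def quadratic_design(x, y):
    """Design matrix for a full quadratic trend surface in the co-ordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])


def fit_quadratic_trend(x, y, z):
    """Least-squares coefficients of the quadratic trend surface."""
    design = quadratic_design(x, y)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return coeffs


def experimental_variogram(x, y, z, lag_width, n_lags):
    """Omnidirectional method-of-moments variogram in classes of width lag_width."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    i, j = np.triu_indices(len(z), k=1)           # every pair counted once
    dist = np.hypot(x[i] - x[j], y[i] - y[j])
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    lag_index = (dist // lag_width).astype(int)
    lags, gammas = [], []
    for k in range(n_lags):
        in_class = lag_index == k
        if in_class.any():                        # skip empty distance classes
            lags.append(dist[in_class].mean())
            gammas.append(half_sq_diff[in_class].mean())
    return np.array(lags), np.array(gammas)


# Synthetic pH-like values with a smooth trend plus noise (not NSI data).
rng = np.random.default_rng(1)
x = rng.uniform(0, 300, 400)                      # km
y = rng.uniform(0, 300, 400)                      # km
z = 6.0 + 0.004 * x - 0.00001 * y**2 + rng.normal(0, 0.5, 400)

coeffs = fit_quadratic_trend(x, y, z)
residuals = z - quadratic_design(x, y) @ coeffs   # analysed in place of z
lags, gammas = experimental_variogram(x, y, residuals, lag_width=10.0, n_lags=15)
```

After kriging the residuals, the trend is restored by evaluating `quadratic_design` at the prediction nodes and adding the product with `coeffs` to the kriged residuals.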

Most of the variograms were fitted by nested functions. The models fitted to the data
included single exponential, spherical, and power functions (including linear), double
exponential and double spherical, and exponential with linear functions. For log10 Zn (Figure 4c)
and the pH residuals, double spherical models provided the best fit. The equations for
the models are given below.

Double  exponential 

$$\gamma(h) = c_0 + c_1\{1 - \exp(-h/r_1)\} + c_2\{1 - \exp(-h/r_2)\}$$

where $c_0$ is the nugget variance, $c_1$ and $r_1$ are the sill and distance parameter of the
first structure, and $c_2$ and $r_2$ are the sill and distance parameter of the second structure.


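Double spherical

$$\gamma(h) = \begin{cases}
c_0 + c_1\left\{\frac{3h}{2a_1} - \frac{1}{2}\left(\frac{h}{a_1}\right)^3\right\} + c_2\left\{\frac{3h}{2a_2} - \frac{1}{2}\left(\frac{h}{a_2}\right)^3\right\} & \text{for } 0 < h \le a_1 \\
c_0 + c_1 + c_2\left\{\frac{3h}{2a_2} - \frac{1}{2}\left(\frac{h}{a_2}\right)^3\right\} & \text{for } a_1 < h \le a_2 \\
c_0 + c_1 + c_2 & \text{for } h > a_2
\end{cases}$$

where $c_1$ and $a_1$ are the sill and distance parameter of the first structure, and $c_2$ and $a_2$ are
the sill and distance parameter of the second structure.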
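
As a minimal sketch, the two nested models named above can be written as functions of the lag distance. The double spherical function assumes the standard spherical structure, which reaches its sill at the range and is flat beyond it. The ranges of 18 km and 61 km are those reported for log10 Zn later in this section; the nugget and sill values are hypothetical, chosen only so that the total sill is 1.

```python
import numpy as np


def double_exponential(h, c0, c1, r1, c2, r2):
    """Nugget plus two exponential structures; gamma(0) is zero by definition."""
    h = np.asarray(h, dtype=float)
    gamma = c0 + c1 * (1.0 - np.exp(-h / r1)) + c2 * (1.0 - np.exp(-h / r2))
    return np.where(h > 0, gamma, 0.0)


def spherical_structure(h, c, a):
    """Single spherical structure with sill c and range a (no nugget)."""
    ratio = np.minimum(np.asarray(h, dtype=float) / a, 1.0)  # flat beyond the range
    return c * (1.5 * ratio - 0.5 * ratio**3)


def double_spherical(h, c0, c1, a1, c2, a2):
    """Nugget plus two spherical structures with ranges a1 < a2."""
    h = np.asarray(h, dtype=float)
    gamma = c0 + spherical_structure(h, c1, a1) + spherical_structure(h, c2, a2)
    return np.where(h > 0, gamma, 0.0)


# Hypothetical nugget and sills; ranges as reported for log10 Zn.
lags_km = np.array([0.0, 5.0, 18.0, 61.0, 100.0])
print(double_spherical(lags_km, c0=0.6, c1=0.2, a1=18.0, c2=0.2, a2=61.0))
```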

Figure 4. Experimental variograms (symbols) and fitted models (lines): a) raw values of
pH, b) pH residuals, and c) log10 Zn.

These results show that there are two main scales of spatial variation: a short-range
component of about 18 km for log10 Zn and 37 km for pH (residuals), and a long-range
component of 61 km for log10 Zn and 118 km for pH (residuals). The average short-range
component  for  the  full  data  for  the  range  of  properties  examined  was  24  km,  and  the 
average  of  the  long-range  component  was  89  km.  There  are  slight  differences  in  the 
ranges  for  these  subsets,  but  they  are  within  similar  orders  of  magnitude.  A  characteristic 
of  the  variograms  of  the  subset  and  of  the  full  data  is  their  large  nugget  variance  (c0):  it  is 
more  than  60%  of  the  sill  variance  for  most  variables.  Most  of  the  nugget  variance  can  be 
accounted  for  by  variation  over  distances  less  than  the  sampling  interval  of  the  grid.  This 
shows  that  the  5-km  grid  interval  misses  a  considerable  proportion  of  the  variation  in  the 
soil. 

Figure 1 shows the pixel map of pH. There are two spatial scales of variation evident in
the map. Areas with a pH of less than 6 are mainly in the western part of the country,
which is also where the main uplands are, and where agriculture is dominated by
grassland systems (optimum pH between 5 and 6). These are also the wettest parts of
England and Wales. Much of central and eastern England has pH values of 7 and above,
partly  reflecting  geology  and  the  distribution  of  calcareous  soils,  but  also  the  widespread 
use  of  lime  on  arable  soils  (optimum  pH  c.  6.5  -  7.5).  There  are  areas  of  lower  pH  in  the  S 
associated  with  the  Tertiary  sands  and  gravels.  The  E-W  differences  in  pH  values  reflect 
the  pattern  of  rainfall  as  well  as  elevation  and  land-use.  Figure  2  shows  the  original 
values  as  a  pixel  map  for  total  Zn.  There  are  many  areas  with  large  concentrations,  and 
the  most  extensive  of  these  follows  the  Jurassic  clay  band  from  SW  to  NE  across  the 
country.  There  are  other  areas  trending  N  to  S  from  the  Midlands  of  England  to 
Tynemouth  (not  on  the  subset  map).  These  seem  to  be  associated  with  the  Carboniferous 
shales  and  sandstones,  as  do  the  areas  of  large  concentrations  in  central  Wales. 

Kriging 

Ordinary  kriging  and  factorial  kriging  (kriging  analysis)  have  been  described  in  earlier 
reports  (Contract  Nos.  N68171-97-C-9029;  N68171-98-M-5311).  Ordinary  kriging  was 
used to reconstruct the data after subsampling them to produce smaller data sets. Punctual
kriging was used so that the estimates and maps could be compared with predictions from
subsets of the data. The estimation grid was chosen to coincide with the 5-km sampling
grid. Estimates were made at the nodes of this grid so that we could compare the kriged
estimates at the sampling points with the original values where these had been removed.
At the places where there were data, punctual kriging returns the sample value. The
original  variograms  were  used  for  the  analysis  because  it  is  unlikely  that  their  structure 
would  change  over  time.  In  addition  those  from  the  subsets  have  large  nugget  variances 
and  they  are  less  reliable  because  there  are  few  comparisons  for  each  semivariance, 
especially  for  the  40-km  grid. 
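
The ordinary punctual kriging step described above can be sketched as follows. This is a minimal illustration in variogram form, not the software used for the report. The single spherical model, its parameters, the 3 x 3 set of points and all names are hypothetical; the default minimum of 4 points within the search radius follows the requirement mentioned in the results for pH.

```python
import numpy as np


def spherical(h, nugget, sill, a):
    """Hypothetical single spherical variogram with a nugget; gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / a, 1.0)
    return np.where(h > 0, nugget + sill * (1.5 * ratio - 0.5 * ratio**3), 0.0)


def ordinary_kriging(coords, values, target, variogram, min_points=4, radius=np.inf):
    """Punctual ordinary kriging estimate and kriging variance at one target point."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    d0 = np.hypot(*(coords - target).T)
    near = d0 <= radius
    if near.sum() < min_points:            # too few neighbours: no estimate
        return np.nan, np.nan
    pts, vals, d0 = coords[near], values[near], d0[near]
    n = len(vals)
    pair_dist = np.hypot(pts[:, None, 0] - pts[None, :, 0],
                         pts[:, None, 1] - pts[None, :, 1])
    # Kriging system: semivariances bordered by ones (unbiasedness constraint).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(pair_dist)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d0)
    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = float(weights @ vals)
    variance = float(weights @ b[:n] + lagrange)
    return estimate, variance


def model(h):
    return spherical(h, nugget=0.1, sill=0.3, a=30.0)


# Hypothetical 3 x 3 block of points on a 10-km spacing with pH-like values.
coords = np.array([(i * 10.0, j * 10.0) for i in range(3) for j in range(3)])
values = np.array([5.2, 5.6, 6.1, 5.9, 6.4, 6.8, 6.5, 7.0, 7.3])

print(ordinary_kriging(coords, values, (5.0, 5.0), model))   # between the nodes
print(ordinary_kriging(coords, values, coords[4], model))    # at a data point
```

At a data point the right-hand side of the system equals one column of the matrix, so the weights collapse onto that point and the sample value is returned with zero kriging variance. This is the exact-interpolation property that produces the 'spotty' appearance of the kriged maps.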


For  pH  ordinary  kriging  was  done  on  the  residuals  and  the  quadratic  trend  added  back  to 
the  estimated  residuals  afterwards. 

The ordinary kriged logarithmically transformed predictions for zinc were back-transformed
for mapping so that the variation could be seen on the original scale in which
the variable had been measured. This is not straightforward because the kriging variance
must be taken into account. The equation for back-transformation is:

$$\hat{Z}(x_0) = \exp\{\hat{Y}(x_0) \ln 10 + 0.5\,\sigma^2(x_0) (\ln 10)^2\}$$

where $\hat{Y}(x_0)$ is the estimated value of log10 Zn at $x_0$ and $\sigma^2(x_0)$ is the estimation variance.
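
The back-transformation can be written as a short function. This is a direct transcription of the equation above; the function name and the example values are hypothetical.

```python
import math


def back_transform_log10(y_hat, kriging_variance):
    """Back-transform a kriged log10 estimate to the original scale.

    y_hat is the kriged estimate of log10 Zn and kriging_variance is the
    estimation variance on the log10 scale.
    """
    ln10 = math.log(10.0)
    return math.exp(y_hat * ln10 + 0.5 * kriging_variance * ln10**2)


# With zero variance the result is simply 10 ** y_hat; a positive variance
# increases the back-transformed value.
print(back_transform_log10(2.0, 0.0))
print(back_transform_log10(2.0, 0.01))
```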

Factorial  kriging  was  done  on  the  full  set  of  data  to  examine  the  different  scales  of 
variation  in  the  data  and  to  compare  the  results  with  those  of  the  multi-resolution  wavelet 
analysis.  The  aim  is  to  filter  out  the  different  scales  of  variation,  so  that  the  independent 
components  of  the  spatial  structure  can  be  examined  as  an  aid  to  further  interpretation. 
Factorial  kriging  estimates  the  long-  and  short-range  components  separately.  The 
variation  is  nested  for  both  pH  and  Zn;  the  variograms  have  two  spatial  structures.  The 
pixel maps of the raw data (Figures 1 and 2) suggest that there are two scales of variation,
and  this  is  confirmed  by  the  variogram  results.  The  variation  at  the  longer  scale  appears 
to  be  related  to  the  geology  for  Zn  and  pH  and  also  rainfall  and  elevation  for  the  latter. 

Zinc is a good example of many of the other variables for this analysis. For the long-range

WAVELET  ANALYSIS 

The  method  for  this  analysis  was  described  in  a  previous  report  (N68171-98-M-5311)  and 
also  in  Oliver  et  al.  (2000)  for  SPOT  image  data.  This  is  the  first  analysis  that  we  are 
aware  of  using  soil  data  in  two  dimensions.  There  are  few  data  sets  in  the  world  for  soil 
that  are  on  a  grid  and  would  provide  adequate  data  for  this  analysis.  Wavelets  enable 
data  reconstruction  and  multi-resolution  analysis  by  deriving  the  low  frequency  and  high 
frequency  coefficients  from  the  data.  The  low  frequency  wavelet  transform  has  been  used 
to  restore  the  data  from  the  subsamples  on  the  original  5-km  grid  and  to  identify  the  long- 
range  spatial  component  at  the  coarser  resolutions.  The  average  of  the  high  frequency 
wavelet  transforms  has  been  used  to  identify  the  short-range  component.  The  advantage 
of  wavelet  analysis  at  the  outset  for  the  pH  data  is  that  there  is  no  need  to  take  account  of 
trend.  An  important  advantage  of  this  analysis  is  that  it  is  unaffected  by  non-stationarity. 
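
The separation into low and high frequency coefficients can be illustrated with one level of a two-dimensional Haar transform. This section does not state which wavelet family was used, so the Haar wavelet is an assumption made only because it is the simplest case; the function names are hypothetical, and the zero-padding helper follows the padding described in the Analysis section.

```python
import numpy as np


def pad_to_power_of_two(grid):
    """Replace gaps (NaN) by zeros and pad to a square with a power-of-two side."""
    g = np.nan_to_num(np.asarray(grid, dtype=float), nan=0.0)
    size = 1 << (max(g.shape) - 1).bit_length()
    padded = np.zeros((size, size))
    padded[: g.shape[0], : g.shape[1]] = g
    return padded


def haar_decompose(grid):
    """One level of the orthonormal 2-D Haar transform on an even-sized array."""
    g = np.asarray(grid, dtype=float)
    a, b = g[0::2, 0::2], g[0::2, 1::2]
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    low = (a + b + c + d) / 2.0           # low frequency (smooth) coefficients
    horiz = (a - b + c - d) / 2.0         # high frequency detail coefficients
    vert = (a + b - c - d) / 2.0
    diag = (a - b - c + d) / 2.0
    return low, (horiz, vert, diag)


def haar_reconstruct(low, details=None):
    """Invert one level; with details=None only the low frequency part is used."""
    if details is None:
        details = tuple(np.zeros_like(low) for _ in range(3))
    horiz, vert, diag = details
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = (low + horiz + vert + diag) / 2.0
    out[0::2, 1::2] = (low - horiz + vert - diag) / 2.0
    out[1::2, 0::2] = (low + horiz - vert - diag) / 2.0
    out[1::2, 1::2] = (low - horiz - vert + diag) / 2.0
    return out


grid = pad_to_power_of_two(np.arange(12.0).reshape(3, 4))    # becomes 4 x 4
low, details = haar_decompose(grid)
smooth = haar_reconstruct(low)             # low frequency reconstruction only
exact = haar_reconstruct(low, details)     # recovers the padded grid
```

Applying `haar_decompose` again to `low` gives the next coarser octave, which is why the data need to be square and sampled in octaves.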


RESULTS  FOR  pH 

The  following  series  of  maps  (Figures  5  to  7)  shows  the  reconstructed  values  of  pH  from 
ordinary  kriging  and  the  low  frequency  wavelet  coefficients  for  the  three  sampling  grids. 
One noticeable difference between the maps is that the kriged maps appear more 'spotty'.
This is because kriging returns the sample value at the data point, whereas the wavelet
analysis gives a predicted value at the data points as well as at other points. Another
difference arises from the fact that the data were padded for the wavelet analysis with
zeros: these are the larger blue areas beyond the coastline and also the urban areas where
there were no sampling locations. In Figure 5b, for the data on a 10-km grid, slightly
more of the original detail in the variation is evident, whereas the kriged map (Figure 5a)
shows the effect of smoothing from kriging.

For Figure 6a and b the effect of the greatly reduced number of sampling sites is evident
in the loss of detail. For Figure 6a the margin around the map arises because there were no
data there to krige from. Kriging (Figure 6a) has smoothed the variation more than
wavelet analysis (Figure 6b), and the spotty appearance of the former map is the effect of
punctual kriging. Figure 7a and b shows the kriged estimates and low frequency wavelet
coefficients for the 40-km grid. It is clear that much detail has been lost and that there is
more difference between these two maps than between those in Figures 5 and 6. Visually
the wavelet analysis appears to have performed better at this level of sampling, which is
what we found for the image data (Oliver et al., 2000). The sparser the sampling the
better the wavelet analysis appears to perform in comparison with kriging.

Figures 8 to 10 show the maps of the comparisons between the predictions from ordinary
punctual kriging and the low frequency wavelet transform, and the original values at the
sampling sites of the 5-km grid. Figure 8a and b shows the comparisons, i.e. the absolute
differences, for predictions based on the 10-km sampling grid. Figure 8a for the kriged
comparisons is a more spotty map than the one from the wavelet analysis: the sampling
points are evident as the blue pixels where there is no error. For the wavelet analysis for
this sampling grid there are fewer zero or small errors than for kriging. This is confirmed
by the histograms of the differences, Figure 11a and b. Kriging also appears to perform
better for predicting the values from the 20-km grid than the wavelet analysis in terms of
the small errors, Figure 9a and b. The histogram of the kriged differences, Figure 11c,
is somewhat misleading because there are fewer comparisons for kriging than for the
wavelet analysis, and it is likely that there would be more of the larger errors than is
evident in the histogram. The slight negative skewness in this histogram suggests that
there is some bias in the predictions. The histograms, Figure 11c and d, for this sampling
grid (20-km) are more similar than for the 10-km one. The comparisons for the
predictions from the 40-km grid suggest that the wavelet analysis has performed slightly
better at this level of sampling, which was the case for the SPOT image data (Oliver et
al., 2000). These histograms, Figure 11d and e, show that the wavelet analysis has more
small errors. There are fewer comparisons for kriging because the method requires a
minimum of 4 points within the search radius and this fails at the margins of the area
where the sampling points become sparse.
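
The comparison used in this section, absolute differences between the predictions and the original 5-km values summarised as histograms, can be sketched as follows. The arrays, bin width and function names are hypothetical; nodes without an original value or without a prediction (for example where kriging found fewer than 4 neighbours) are marked by NaN and excluded.

```python
import numpy as np


def absolute_differences(original, predicted):
    """Absolute differences at nodes where both values are available."""
    original = np.asarray(original, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    valid = ~np.isnan(original) & ~np.isnan(predicted)
    return np.abs(original[valid] - predicted[valid])


def error_histogram(abs_diff, bin_width, n_bins):
    """Counts of absolute differences in equal-width classes starting at zero."""
    edges = np.arange(n_bins + 1) * bin_width
    counts, _ = np.histogram(abs_diff, bins=edges)
    return edges, counts


# Hypothetical pH values: one exact match, one error of 0.5, two nodes excluded.
original = np.array([7.0, 6.5, np.nan, 5.0])
predicted = np.array([7.0, 6.0, 6.0, np.nan])

diffs = absolute_differences(original, predicted)
edges, counts = error_histogram(diffs, bin_width=0.25, n_bins=4)
print(diffs, counts)
```

Because the two methods can have different numbers of valid comparisons, it may be easier to compare the histograms as proportions (counts divided by `len(diffs)`) than as raw counts.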

Summary 

These results are interesting when compared with the analysis of the SPOT data. The NSI
data appear not to contain local non-stationarity for pH. This would occur where
there are marked boundaries in the soil, for example. At the 5-km sampling interval used
here, local non-stationarity is less likely than for more intensively sampled data and
remotely sensed data. Therefore, the errors for kriging are less than they were for the
analysis of the SPOT data where there were marked changes at lakes and other
boundaries causing local non-stationarity. Since kriging is an exact interpolator and
wavelet analysis is not, there remains the need to combine the methods. It seems that
some progress on this has been made at the Centre de Géostatistique, Fontainebleau.
However, at the moment it is difficult to ascertain the extent of this and we shall
endeavour to take this forward.

Another point of interest from this analysis is that the pH data from the NSI survey
contain long-distance trend. This means that part of the variation depends on the spatial
coordinates. This violates the assumptions of geostatistics in the same way as local trend
or drift, i.e. local non-stationarity. This affected the variogram, as was evident above
(Figure 4a and b), and meant that we had to remove the trend, do the analysis on the
residuals, and add back the trend after kriging. This is clearly a considerable amount of
additional effort over and above the straightforward analysis. It is evident from the results
of the wavelet analysis that the predictions are unaffected by the trend. Therefore, if there
is a choice of method available, situations with known trend present would benefit from
the wavelet analysis.


Figure  5.  Predictions  of  pH  at  a  5-km  interval  from  data  on  a  10-km  grid 


b)  pH  -  low  frequency  wavelet  transform  for  data  on  a  10-km  grid 


Figure  6.  Predictions  at  a  5-km  interval  from  data  on  a  20-km  grid 
a)  pH  -  kriged  estimates  from  data  on  a  20-km  grid 


b) pH - low frequency wavelet transform for data on a 20-km grid


Figure  7.  Predictions  at  a  5-km  interval  from  data  on  a  40-km  grid. 


a)  pH  -  kriged  estimates  from  data  on  a  40-km  grid 


Figure 8. Comparisons between estimates from data on a 10-km grid
a) pH - comparisons for kriging
b) pH - comparisons for wavelet analysis

Figure 9. Comparisons for estimates from data on a 20-km grid
a) pH - comparisons for kriging

Figure 10. Comparisons for estimates on a 40-km grid
a) pH - comparisons for kriging

Figure 11. Histograms of the differences for pH from kriging and wavelet analysis.
a) Kriged pH sampled at 1 in 4
b) Wavelet pH sampled at 1 in 4
c) Wavelet pH sampled at 1 in 16
d) Kriged pH sampled at 1 in 64
e) Wavelet pH sampled at 1 in 64

Results of factorial kriging and wavelet analysis for pH

Factorial kriging was applied to the data on the 5-km grid, but the equivalent analysis for
wavelets was done on all of the subsamples. The reasons for this were given in the
previous final report. Figure 12 shows the long-range component from kriging analysis.
The results for all of England and Wales are given at the end of the report, Figure 24.


Figure  12.  Long-range  estimates  of  pH  from  kriging  analysis 

The map of the long-range estimates for pH is similar to the kriged estimates from the
20-km grid; it is not as similar to any of the low frequency wavelet predictions, Figures
5 to 7. The long-range variation, Figure 12, shows that the larger values are generally
associated with the lowland areas and the limestone uplands. However, the western
coastal areas have large values of pH, which are most probably associated with the
deposition of sodium ions by rain in these areas.

Figure 13b, c and d shows the high frequency wavelet component for pH from the
wavelet analysis. It is evident that the result for the 20-km grid is the closest to the
short-range component from kriging analysis. This reflects
the same resolution for extracting the long-range component also. There are some
similarities  in  the  detail  of  the  distributions,  but  there  are  also  differences.  In  the  future 
we  shall  examine  the  differences  between  these  particular  results  to  assess  their  relative 
performances  in  more  detail.  The  high  frequency  component  for  the  40-km  grid  has  not 
identified  the  relevant  short-range  component. 

Again  an  interesting  point  emerges  that  we  observed  in  the  previous  analysis  of  the  SPOT 
data.  The  level  of  resolution  at  which  the  wavelet  analysis  has  identified  the  long-  and 
short-range  components  of  the  variation  is  related  to  the  short-range  parameter  of  the 
variogram.  We  can  now  suggest  more  forcibly  that  for  a  multiresolution  analysis  using 
wavelets  the  best  approach  is  to  compute  the  variogram  first. 


Figure  13.  Short-range  variation  of  pH 

a)  Short-range  component  of  pH  from  kriging  analysis  on  the  5-km  grid 


b) High frequency wavelet coefficient of pH from data on the 10-km grid


Figure  14.  Short-range  variation  of  pH. 

a) High frequency wavelet coefficient of pH from data on the 20-km grid

RESULTS FOR ZINC

Figure 2a and b show the original values for total log10 Zn. There are many areas with
large  concentrations,  and  the  most  extensive  of  these  follows  the  Jurassic  clay  band  from 
SW  to  NE  approximately.  There  are  other  areas  trending  N  to  S  from  the  Midlands  of 
England  to  the  north.  These  seem  to  be  associated  with  the  Carboniferous  shales  and 
sandstones,  as  do  the  areas  in  central  Wales.  The  large  values  around  Avonmouth  (SW) 
are  associated  with  the  smelting  industry  there. 

For Zn the common logarithms were analysed and the values back-transformed for
mapping as described above. Figure 15 a and b show the maps of the predictions using
ordinary punctual kriging and the low frequency wavelet coefficients for data on the
10-km grid. The results are similar. The spotty appearance of the kriged map arises from
the fact that kriging restores the data at the sampling points with no error. The overall
result shows that kriging smooths more than the wavelet analysis. Nevertheless the maps
are similar to those for the original data, Figure 2 b.
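
The statement that punctual kriging restores the data at the sampling points can be checked with a small sketch of ordinary kriging. This is an illustration under stated assumptions, not the program used for the report: the exponential variogram, its parameters, the use of all data in a single neighbourhood and the synthetic values are all hypothetical.

```python
import numpy as np

def exp_variogram(h, sill=1.0, distance_parameter=30.0):
    # Hypothetical exponential model with no nugget; the parameters are
    # placeholders, not the values fitted in the report.
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / distance_parameter))

def ordinary_kriging(xy, z, targets, variogram=exp_variogram):
    """Ordinary punctual kriging using all data points (global neighbourhood)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(z)
    # Left-hand side: semivariances among data, bordered by the
    # unbiasedness constraint (weights sum to one).
    d_data = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d_data)
    a[n, n] = 0.0
    # Right-hand side: semivariances between data and each target.
    d_target = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=2)
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = variogram(d_target)
    weights = np.linalg.solve(a, b)[:n, :]   # drop the Lagrange multiplier row
    return weights.T @ z

# Synthetic data on a 10-km grid (coordinates in km).
gx, gy = np.meshgrid(np.arange(0, 40, 10.0), np.arange(0, 40, 10.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])
z = np.random.default_rng(0).normal(size=len(xy))

# Targets on the 5-km grid include the original sampling points.
tx, ty = np.meshgrid(np.arange(0, 35, 5.0), np.arange(0, 35, 5.0))
targets = np.column_stack([tx.ravel(), ty.ravel()])
pred = ordinary_kriging(xy, z, targets)
```

With no nugget variance in the model, the prediction at a target that coincides with a sampling point equals the datum there, which is what gives a kriged map its spotty appearance at the data locations.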

The  pattern  of  variation  in  the  estimates  from  the  20-km  grid  for  both  analyses  is  also 
preserved  well,  Figure  16  a  and  b.  The  degradation  in  detail  is  clear,  but  the  large-scale 
pattern  is  still  evident.  Again  the  results  for  both  methods  of  analysis  are  similar  -  more 
so  than  for  pH. 

Figure 17 a and b show the results for the 40-km grid. The results from the wavelet
analysis, although showing a loss of detail, still show a similar pattern of variation,
Figure 17 b, to that of Figure 16 b. The kriged results do not show such a good
resemblance to the original pattern of variation. Note particularly the loss of accuracy in
the north western part of the country.




These results again accord with the findings for pH and for the analysis of the SPOT
image. When the number of samples is few and the distance between them large, kriging
restores the data less well than the wavelet analysis. This effect is supported by the maps
of the differences, Figures 18 to 20, and by the histograms, Figure 21.

Figures 18 to 20 show the maps of the absolute differences between the predictions from
ordinary punctual kriging and the low frequency wavelet transform, and the original
values at the sampling sites of the 5-km grid. Figure 18 a and b show the comparisons
for predictions based on the 10-km sampling grid. Figure 18 a, for the kriged comparisons,
is a more spotty map than the one from the wavelet analysis: the sampling points are
evident as the blue pixels where there is no error. For the wavelet analysis on this
sampling grid there are fewer zero or small errors than for kriging. This is confirmed by
the histograms of the differences, Figure 21 a and b. The same negative skew in the errors
is evident for Zn as for pH. Kriging does not appear to have performed quite as well for
Zn for the 20-km grid as the wavelet analysis in terms of the small errors, Figure 21 c and
d. The maps of the differences, Figure 19 a and b, do not show this as clearly: both
methods appear to have performed similarly from these two maps. The slight negative
skewness in this histogram again suggests that there is some bias in the predictions. The
comparisons for the predictions from the 40-km grid suggest that there is less difference
between the wavelet analysis and kriging than the maps of the estimates, Figure 17,
suggested there would be. The histograms, Figure 21 d and e, confirm this, although direct
comparison is not possible because kriging has not gone to the edges of the area for the
reasons given before.
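
The comparisons described above rest on the differences between predictions and original values, their absolute values for the maps, and the shape of their histogram. The following sketch shows one way such a summary might be computed; the function name and the use of the moment coefficient of skewness are assumptions made for the example, and the numbers are invented.

```python
import numpy as np

def error_summary(predicted, observed):
    """Summarise prediction errors at the validation sites.

    Returns the mean error (a measure of bias), the mean absolute error
    and the moment coefficient of skewness of the differences.
    Sites where either value is NaN are ignored.
    """
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    diff = diff[~np.isnan(diff)]
    mean_error = diff.mean()
    sd = diff.std()
    # Third central moment scaled by the cube of the standard deviation.
    skew = np.mean((diff - mean_error) ** 3) / sd ** 3 if sd > 0 else 0.0
    return {
        "mean_error": float(mean_error),
        "mean_abs_error": float(np.mean(np.abs(diff))),
        "skewness": float(skew),
    }

# Invented example: one large under-prediction gives a negative skew.
observed_demo = np.zeros(5)
predicted_demo = np.array([-4.0, 1.0, 1.0, 1.0, 1.0])
summary_demo = error_summary(predicted_demo, observed_demo)
```

A negative skewness indicates a tail of large negative differences, that is, a few sites where the predictions fall well below the observed values.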

Summary 

These results are interesting when compared with the analysis of the SPOT data. The NSI
data for zinc, like those for pH, do not appear to contain local non-stationarity. This
explains the somewhat better performance of kriging for the 10-km grid.

The zinc values were skewed and this means that the variances when computing the
variogram are unstable. The values were transformed to common logarithms, log10 Zn,
and kriging was performed on the logarithms and these values were back-transformed
afterwards so that the values could be shown on their original measurement scale as for
the wavelet analysis. Again this clearly involves additional effort over and above the
wavelet analysis, which does not require non-normal distributions to be transformed.
Avoiding the transformation has a further advantage because the transformation causes
additional smoothing of the predictions, although this does not seem to be particularly
evident from the results given here.
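
The effect of the logarithmic transformation on a skewed variable can be illustrated with a short sketch. The data below are synthetic lognormal values, not NSI measurements, and the back-transformation shown is the simple power of ten; whether any further correction was applied in the report is not stated in this section, so none is assumed here.

```python
import numpy as np

def to_log10(values):
    """Transform strictly positive concentrations to common logarithms."""
    return np.log10(np.asarray(values, dtype=float))

def back_transform(log_values):
    """Simple back-transformation to the original measurement scale."""
    return 10.0 ** np.asarray(log_values, dtype=float)

def skewness(x):
    """Moment coefficient of skewness."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

# Synthetic, strongly skewed values standing in for Zn concentrations.
zn_demo = np.random.default_rng(1).lognormal(mean=4.0, sigma=0.8, size=2000)
log_zn_demo = to_log10(zn_demo)

skew_raw = skewness(zn_demo)       # strongly positive
skew_log = skewness(log_zn_demo)   # close to zero
restored = back_transform(log_zn_demo)
```

Working with the logarithms stabilises the variances used in the variogram, at the cost of the extra transformation and back-transformation steps noted above.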


Figure 15. Predictions of Zn at a 5-km interval from data on a 10-km grid
a) Zn - kriged estimates from data on the 10-km grid
b) Zn - low frequency wavelet coefficients from data on the 10-km grid

Figure 16. Predictions of Zn at a 5-km interval from data on a 20-km grid
a) Zn - kriged estimates from data on the 20-km grid


Figure  17.  Predictions  of  Zn  at  a  5-km  interval  from  data  on  a  40-km  grid 
a) Zn - kriged estimates from data on the 40-km grid


Figure 18. Comparisons for estimates on a 10-km grid
a) Zn - comparisons for kriging

Figure 21. Histograms of Zinc
a) Kriged Zn sampled at 1 in 4
b) Wavelet Zn sampled at 1 in 4
c) Kriged Zn sampled at 1 in 16
c) Wavelet Zn sampled at 1 in 16
d) Kriged Zn sampled at 1 in 64
e) Wavelet Zn sampled at 1 in 64

Results of factorial kriging and wavelet analysis for Zn

Factorial kriging was [...] wavelets was done on all of the subsamples. The reasons for
this were given in the previous final report. Figure 22 shows the long-range component
from kriging analysis on the logarithmic scale. These results were not back-transformed
because of the way in which these estimates are derived. The long-range kriged estimates
for log10 Zn show that the largest values occur near to the Avonmouth smelter in the west
of England, and in another area in Derbyshire. There are large values associated with the
Jurassic clays trending from SW to NE, with the Carboniferous limestone in Derbyshire,
Carboniferous shales in the NE and Ordovician rocks in Wales. This distribution has some
similarities with that for Cr. The values for log10 Zn for all of England and Wales are
shown at the end. These results show the closest relation with the 20-km grid wavelet
analysis, Figure 16 b, even though the colour scales appear somewhat different because
Figure 22 is for logarithms.

Figure 22. Long-range component of log10 Zn from kriging analysis (map legend: ten classes from below 1.50 to above 2.40)


Figure 23 a shows the short-range component of the variation from kriging analysis, and
the maps for the whole of England and Wales for this analysis are given at the end of the
report, Figure 25. The short-range component was investigated previously; it has a strong
similarity with the map of the short-range component for Cr (not shown). These
distributions also show a relation with the small scale drainage basins and local changes
in rock and soil types.

Figure 23 b, c and d show the high frequency wavelet component for Zn from the
wavelet analysis. It is evident that the result for the 20-km grid is the closest to that for
the short-range component from kriging analysis. The same level of resolution also gave
the closest match for the long-range component. There are some similarities in the detail
of the distributions, but there are also differences. In the future we shall examine the
differences between these particular results to assess their relative performances in more
detail. The high frequency component for the 40-km grid has also identified some of the
relevant short-range component.
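
To make the terms low frequency and high frequency component concrete, the following sketch performs one level of a two-dimensional decomposition with the Haar wavelet. The choice of the Haar wavelet, the averaging normalisation and the function names are assumptions made for the illustration; this section does not state which wavelet was used in the analysis. Each level halves the resolution, so that on this reading data on a 5-km grid would give components at 10 km, 20 km and 40 km after one, two and three levels.

```python
import numpy as np

def haar_level(z):
    """One level of a 2-D Haar decomposition (averaging form).

    z must have even numbers of rows and columns. Returns the low
    frequency component (2 x 2 block means) and three high frequency
    components (differences across columns, across rows and diagonal).
    """
    z = np.asarray(z, dtype=float)
    a, b = z[0::2, 0::2], z[0::2, 1::2]   # top-left, top-right of each block
    c, d = z[1::2, 0::2], z[1::2, 1::2]   # bottom-left, bottom-right
    low = (a + b + c + d) / 4.0
    d_cols = (a - b + c - d) / 4.0
    d_rows = (a + b - c - d) / 4.0
    d_diag = (a - b - c + d) / 4.0
    return low, (d_cols, d_rows, d_diag)

def haar_inverse(low, details):
    """Rebuild the finer grid exactly from one level of components."""
    d_cols, d_rows, d_diag = details
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = low + d_cols + d_rows + d_diag
    out[0::2, 1::2] = low - d_cols + d_rows - d_diag
    out[1::2, 0::2] = low + d_cols - d_rows - d_diag
    out[1::2, 1::2] = low - d_cols - d_rows + d_diag
    return out

# Synthetic 8 x 8 field standing in for values on the 5-km grid.
field = np.random.default_rng(2).normal(size=(8, 8))
low_10km, high_10km = haar_level(field)       # one level
low_20km, high_20km = haar_level(low_10km)    # two levels
rebuilt = haar_inverse(low_10km, high_10km)
```

The low frequency component carries the smooth, long-range pattern, and the high frequency components carry the variation lost in going to the coarser grid, which is the part compared above with the short-range component from factorial kriging.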

Again an interesting point emerges from the observations above: the level of resolution at
which the wavelet analysis has identified the long- and short-range components of the
variation is related to the short-range parameter of the variogram.